Q1.

Q2.

(a)

(b)

(c)

Benefits

One-hot encoding makes our training data more expressive, and it can be rescaled easily. Because the encoded values are numeric, we can readily interpret model outputs as probabilities over the classes. In particular, one-hot encoding is used for our output values, since a probability vector gives more nuanced predictions than a single hard label.

Determining the state has a low and constant cost of accessing one flip-flop

Changing the state has the constant cost of accessing two flip-flops

Easy to design and modify

Easy to detect illegal states

Using a one-hot encoding typically allows a state machine to run at a faster clock rate than any other encoding of that state machine
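The encoding itself is just an index-to-vector mapping; a minimal NumPy sketch (the helper name `one_hot` is illustrative, not from this report):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer class labels to one-hot row vectors."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

y = np.array([0, 2, 1])
Y = one_hot(y, 3)
# Each row contains a single 1, and argmax recovers the original label.
```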

Q3.

(a)

KNN

Here, the test error for KNN outperforms (is lower than) the 5% given in the report
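A KNN run of this kind can be sketched with scikit-learn; the digits dataset and `n_neighbors=3` below are stand-ins, since the report's actual dataset and parameters are not shown here:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
test_error = 1 - knn.score(X_test, y_test)  # fraction misclassified
```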

Adaboost

SVM

Here, the test error for SVM is higher than the 1.4% given in the report. This may be due to the choice of random state: some train/test splits perform better than others.
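The random-state effect mentioned above is easy to check by re-splitting with different seeds; the dataset below is a stand-in for the one in the report:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

errors = []
for seed in (0, 1, 2):
    # Each random_state produces a different train/test split ...
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    # ... and therefore a slightly different test error.
    errors.append(1 - clf.score(X_te, y_te))
```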

(b)

RandomForest

GridSearchCV with Logistic Regression
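A GridSearchCV-over-logistic-regression setup can be sketched as follows; the `C` grid and the digits dataset are illustrative choices, not necessarily those used in the report:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

# Search over regularization strengths with 3-fold cross-validation.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=3,
)
grid.fit(X, y)
best_C = grid.best_params_["C"]  # the best regularization strength found
```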

Using CNN

Here, using a CNN, the test error finally outperforms all three of the classifiers I implemented in part 2(a)

Q4.

3. ANN

(a) & (b)

The first plot shows the average training/test cross-entropy error vs. the number of epochs.

The second plot shows the classification error (in percent) vs. the number of epochs.

Difference between (a) & (b)

Comparing the two types of plots, we can see that in the plots for (a), the validation loss starts to increase after a certain point. In my model with learning rate 0.1, the increase begins around the 2nd epoch, and the loss keeps growing gradually over the rest of training.

For the plots in (b), the misclassification error (in percent) eventually converges to a certain value, around 2% in this model, and it does not change much as training continues.

As a result, this model obtains its best predictions at around the 4th epoch.
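The "best around the 4th epoch" observation amounts to selecting the epoch with the lowest validation loss; a minimal sketch, using a hypothetical loss curve shaped like the one described above:

```python
def best_epoch(val_losses):
    """Return the 1-based epoch whose validation loss is lowest."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Hypothetical curve: the loss bottoms out at epoch 4, then overfits.
losses = [0.40, 0.25, 0.20, 0.18, 0.21, 0.26]
best_epoch(losses)  # -> 4
```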

(c)

(d)

For computational efficiency, I test these parameters individually, keeping the other parameters the same as in parts (a) & (b), rather than testing all combinations.

learning rate of 0.01

learning rate of 0.2

learning rate of 0.5

momentum of 0.0

This is the same setting as in parts (a) & (b)

momentum of 0.5

momentum of 0.9
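Sweeping one hyperparameter at a time while holding the rest at the (a) & (b) baseline can be sketched with PyTorch's SGD; the `Linear` model below is a stand-in for the actual ANN:

```python
import torch

model = torch.nn.Linear(10, 2)  # stand-in for the ANN used above
base = {"lr": 0.1, "momentum": 0.0}  # baseline from parts (a) & (b)

for name, values in [("lr", [0.01, 0.2, 0.5]),
                     ("momentum", [0.0, 0.5, 0.9])]:
    for v in values:
        # Change exactly one hyperparameter; keep the rest at the baseline.
        cfg = {**base, name: v}
        opt = torch.optim.SGD(model.parameters(), **cfg)
        # ... train for a fixed number of epochs and record the error curve ...
```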

4. CNN

Since the CNN works with images, we would like to use relatively small batches

(a) & (b)

Plot the loss and accuracy comparisons

As we can see, the CNN converges faster than the ANN, so there is no need to use as many epochs to train it.
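A minimal PyTorch setup of this kind, with a small batch size and a toy CNN (the layer sizes and the dummy 28x28 data are illustrative, not the report's actual architecture):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy grayscale "images" standing in for the real dataset.
images = torch.randn(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# A small CNN: one conv block followed by a linear classifier.
cnn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),             # 28x28 -> 14x14
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 14 * 14, 10),  # 10 class logits
)

xb, yb = next(iter(loader))
logits = cnn(xb)  # shape: (batch_size, 10)
```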

(c)

(d)

lr = 0.01

lr = 0.2

lr = 0.5

momentum = 0.0

momentum = 0.5

momentum = 0.9

5. Beat the performance of SVM with Gaussian Kernel

Since I'm using optim.Adam as my optimizer, I will skip the momentum tests.
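The rationale is that Adam already carries momentum-like state: its `betas` control exponential averages of the gradient and its square, playing the role that the momentum coefficient plays in SGD. A sketch with PyTorch's defaults (the `Linear` model is a stand-in for the CNN):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the CNN
# betas=(0.9, 0.999) are Adam's defaults; beta1 acts much like SGD momentum,
# so a separate momentum sweep would be redundant.
opt = torch.optim.Adam(model.parameters(), lr=0.01)
```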

Different parameters

(c)

Visualize conv filter

first filter

second filter
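Extracting the filters for visualization comes down to slicing the conv layer's weight tensor; an untrained `Conv2d` below stands in for the trained layer, and the sizes are illustrative:

```python
import torch

conv = torch.nn.Conv2d(1, 8, kernel_size=3)  # stand-in for the trained layer
filters = conv.weight.detach()  # shape: (out_channels, in_channels, 3, 3)

first = filters[0, 0]   # the "first filter" above
second = filters[1, 0]  # the "second filter"
# e.g. plt.imshow(first, cmap="gray") to render each filter as an image
```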

Q5.

Since I'm using optim.Adam as my optimizer, I will skip the momentum tests; I also plot accuracy instead of misclassification error.

(a)

The last coordinate of each sample is its label

Data Preparation
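Since the last coordinate of each row is the label, preparation reduces to slicing off the final column; the toy array below is illustrative:

```python
import numpy as np

# Each row is one sample; the final coordinate is the class label.
data = np.array([
    [0.1, 0.7, 0],
    [0.9, 0.2, 1],
    [0.4, 0.5, 0],
])
X = data[:, :-1]             # features: all but the last column
y = data[:, -1].astype(int)  # labels: the last column
```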

(b)

ANN

(c)

lr = 0.2

lr = 0.5

CNN

(c)

lr = 0.2

lr = 0.5